Develop #21

Merged
AperturePlus merged 8 commits into master from develop on Mar 13, 2026
@AperturePlus
Owner

No description provided.

AperturePlus and others added 8 commits February 26, 2026 00:03
Why:
- Indexing against Ollama WordPiece-based embedding models can exceed model context limits because chunk sizing used only cl100k_base token counts.
- Ollama error payloads using "input length exceeds the context length" were not consistently classified as token-limit failures.
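A minimal sketch of the second point, assuming the classifier only sees an HTTP status code and the raw error body. The function name `is_token_limit_error` and every marker other than the Ollama phrase quoted above are hypothetical and are not taken from this repository.

```python
# Substrings that mark a "too many tokens" failure.
# Only the first entry comes from the commit message; the rest are assumed examples.
TOKEN_LIMIT_MARKERS = (
    "input length exceeds the context length",  # Ollama phrasing quoted above
    "maximum context length",                   # assumption
    "too many tokens",                          # assumption
)


def is_token_limit_error(status_code: int, message: str) -> bool:
    """Return True when a 400 response body looks like a context-length failure."""
    if status_code != 400:
        return False
    lowered = message.lower()
    return any(marker in lowered for marker in TOKEN_LIMIT_MARKERS)
```

Matching on a lowercased substring keeps the check tolerant of payload wrappers such as a JSON `error` field, at the cost of being sensitive to wording changes in the server.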

What:
- Added tokenizer strategies (tiktoken, character, simple) and strategy-based default tokenizer factory.
- Added ACI_TOKENIZER config/env support and wired tokenizer selection into service initialization so chunking and summary generation share the chosen tokenizer.
- Expanded token-limit error detection to include context-length style 400 responses.
- Updated unit/property tests for tokenizer strategy selection, offline-safe tokenizer tests, config strategy generation, and token-limit message coverage.
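The strategy selection described above could look roughly like the sketch below. Only the strategy names (tiktoken, character, simple), the `ACI_TOKENIZER` variable, and the `cl100k_base` encoding come from the commit message; the function names, the counting rules for `character` and `simple`, and the `tiktoken` default are assumptions made for illustration.

```python
import os
from typing import Callable, Mapping, Optional

TokenCounter = Callable[[str], int]


def count_characters(text: str) -> int:
    """Assumed 'character' strategy: one token per character (deliberately pessimistic)."""
    return len(text)


def count_simple(text: str) -> int:
    """Assumed 'simple' strategy: one token per whitespace-separated word."""
    return len(text.split())


def make_tiktoken_counter() -> TokenCounter:
    """'tiktoken' strategy: count cl100k_base tokens (may download the encoding)."""
    import tiktoken  # imported lazily so offline strategies never need it

    encoding = tiktoken.get_encoding("cl100k_base")
    return lambda text: len(encoding.encode(text))


def create_token_counter(env: Optional[Mapping[str, str]] = None) -> TokenCounter:
    """Pick a counter from ACI_TOKENIZER; the 'tiktoken' default is an assumption."""
    env = os.environ if env is None else env
    strategy = env.get("ACI_TOKENIZER", "tiktoken").strip().lower()
    if strategy == "character":
        return count_characters
    if strategy == "simple":
        return count_simple
    if strategy == "tiktoken":
        return make_tiktoken_counter()
    raise ValueError(f"Unknown ACI_TOKENIZER strategy: {strategy!r}")
```

Passing one counter to both chunking and summary generation keeps their size budgets consistent, which appears to be the intent of wiring the selection into service initialization.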

Test:
- uv run pytest tests/unit/test_tokenizer.py tests/property/test_embedding_client_properties.py tests/property/test_config_properties.py -q (pass)
- uv run ruff check src tests (pass)
- uv run pytest tests/ -v --tb=short -q --durations=10 --maxfail=1 (fails in this environment because a proxy blocks the tiktoken encoding download)
- uv run mypy src --ignore-missing-imports --no-error-summary (reports pre-existing repo-wide mypy errors)
Why: MCP runtime path mappings were bypassed for absolute host paths on Windows, which broke the new MCP path-resolution flow and caused pytest failures. The latest MCP changes also expanded the context contract and exposed a flaky Hypothesis deadline in file scanner tests.

What: apply runtime path mappings before native absolute-path fallback, add runtime path resolution coverage for Windows and POSIX host paths, update MCP/qdrant/property tests to match the new contract, and disable the flaky deadline on the ignore-pattern property test.
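As a rough illustration of the ordering fix, the sketch below consults the mappings first and only falls back to the path as given when no mapping matches. The function name `resolve_runtime_path`, the mapping shape (host prefix to runtime prefix), and the example paths are hypothetical; the comparison is case-sensitive for simplicity, which a real Windows implementation might not want.

```python
from typing import Mapping


def _normalize(path: str) -> str:
    """Use forward slashes and drop trailing slashes so Windows and POSIX paths compare alike."""
    return path.replace("\\", "/").rstrip("/")


def resolve_runtime_path(path: str, mappings: Mapping[str, str]) -> str:
    """Translate a host path to its runtime path, preferring mappings over the native path."""
    normalized = _normalize(path)

    # 1. Apply mappings first, longest host prefix first, even for absolute paths.
    for host_prefix in sorted(mappings, key=lambda p: len(_normalize(p)), reverse=True):
        prefix = _normalize(host_prefix)
        # Match only on a path-component boundary, so /a/b does not match /a/bc.
        if normalized == prefix or normalized.startswith(prefix + "/"):
            remainder = normalized[len(prefix):]
            return mappings[host_prefix].rstrip("/") + remainder

    # 2. Fall back to the path as given only when no mapping applies.
    return path
```

For example, with the hypothetical mapping `{"C:\\Users\\dev\\project": "/workspace"}`, the host path `C:\Users\dev\project\src\main.py` resolves to `/workspace/src/main.py`, while an unmapped path is returned unchanged.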

Test: uv run ruff check src tests (passed)

Test: uv run pytest tests/ -v --tb=short -q --durations=10 (passed, 660 passed / 16 skipped)
…ch-with-ollama-models

fix(tokenizer): add configurable tokenizer strategies and improve token-limit detection
@AperturePlus AperturePlus merged commit 2f7eec6 into master Mar 13, 2026
1 check passed